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Growth charts are widely used in pediatric care for assessing 
childhood body size measurements (e.g., height or weight). The exist¬ 
ing growth charts screen one body size at a single given age. However, 
when a child has multiple measures over time and exhibits a growth 
path, how to assess those measures jointly in a rigorous and quanti¬ 
tative way remains largely undeveloped in the literature. In this pa¬ 
per, we develop a new method to construct growth charts for growth 
paths. A new estimation algorithm using alternating regressions is 
developed to obtain principal component representations of growth 
paths (sparse functional data). The new algorithm does not rely on 
strong distribution assumptions and is computationally robust and 
easily incorporates subject level covariates, such as parental informa¬ 
tion. Simulation studies are conducted to investigate the performance 
of our proposed method, including comparisons to existing methods. 

When the proposed method is applied to monitor the puberty growth 
among a group of Finnish teens, it yields interesting insights. 

1. Introduction. In pediatric practice, height, weight and other body 
size measurements are frequently examined for infants, children and ado¬ 
lescents in order to ensure their healthy growth. The most commonly used 
tools are growth charts, also known as reference centile charts. The funda¬ 
mental purpose of growth charts is to identify percentile ranks of individuals 
with respect to their corresponding reference populations, and to screen out 
subjects with extreme ranks, either too high or too low, for further medical 
investigations. The conventional growth charts consist of a series of per¬ 
centile curves for a certain measurement over ages. Those percentile curves 
are estimated from a reference population using penalized likelihood meth¬ 
ods introduced in Cole (1988) and Cole and Green (1992). They are used to 
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identify individual percentile ranks at specific ages. Lately, several methods, 
including Thompson and Fatti (1997), Scheike, Zhang and Juul (1999), Wei 
et al. (2006) and Chen and Muller (2012), were proposed to further incor¬ 
porate prior information and subject level covariates into growth charts. In 
these methods, the reference percentiles are estimated by conditioning on not 
only target ages but also prior measurements and other important variables, 
such as prognostic and parental information. The resulting growth charts are 
hence called conditional growth charts. Thompson and Fatti (1997) assumed 
a multivariate normal distribution for the measurements and the covariates 
at all time points and used the maximum likelihood estimator for the mean 
and variance functions. Scheike, Zhang and Juul (1999) considered a longi¬ 
tudinal regression model accounting for the previous measurement adjacent 
to the current measurement and the duration in between. To avoid a partic¬ 
ular distributional assumption, Wei et al. (2006) proposed a semi-parametric 
quantile regression model to construct conditional growth charts. 

Both conventional and conditional growth charts screen only one single 
measurement at a time. However, due to common clinical practice, each in¬ 
dividual has its measurements collected longitudinally and exhibits a growth 
path over time. A growth path may not be normal even if each of its mea¬ 
surements is within the normal ranges of both conventional and conditional 
growth charts. For example, as shown in Figure 1, this subject starts at 
the 90th percentile in weight at the age of 0.5, and gradually declines to 
the 15th percentile around the age of 2.5. Although such a slow decline in 
the growth path should be alerting, it cannot be recognized by conventional 
growth charts, because all its measurements are within the normal ranges. 



Age (years) 

Fig. 1. An example of an abnormal growth pattern. The dots represent the growth path 
for a subject. The curves are the percentile curves at guantile levels 0.05, 0.25, 0.50, 0.75 
and 0.95. The x axis represents ages. The y axis represents weights. 
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It cannot be detected by conditional growth charts either, since the changes 
from the preceding measurements are not large enough. 

Therefore, screening entire growth paths may bring new insights into 
growth screening. However, existing screening methods for growth paths 
are mostly empirical, relying heavily on personal experiences of medical 
providers [Legler and Rose (1998)]. Rigorous quantitative screening meth¬ 
ods for entire growth paths remain largely undeveloped. Hence, in this paper, 
we propose a new statistical method to construct growth charts that enable 
the screening of entire growth paths. 

Growth charts are estimated from a reference growth data set, which is 
collected from a representative sample in a target population, and consists 
of longitudinal body size measurements. Most reference growth data share 
the following characteristics. First, each growth path is only observed at 
sparse and irregularly spaced time points with possible measurement errors. 
Therefore, statistical tools developed for multivariate and functional data 
are directly applicable, as the former requires a fixed measurement schedule 
and the latter requires densely observed data on each growth path. Second, 
the distributions of body size measurements are unlikely to follow certain 
parametric distributions. Therefore, likelihood based parametric approaches 
are often undesirable in such applications. 

Considering the characteristics of reference growth data, we develop a 
two-step procedure for identifying percentile ranks of growth paths. In the 
first step, we propose a novel regression based principal component analysis 
(PCA) algorithm that is tailored specifically for reference growth data. In the 
second step, we construct the multivariate quantile contours of the resulting 
component scores, which can be used to identify percentile ranks of growth 
paths. The proposed PCA algorithm can also incorporate covariates, which 
in turn enables the screening of growth paths conditioned on individual 
characteristics. 

The rest of this paper is organized into the following structure: In Sec¬ 
tion 2 we elaborate on the proposed screening method, including the general 
model settings and notation in Section 2.1, the introduction of the proposed 
regression based principal component analysis in Section 2.2, the construc¬ 
tion of growth charts for screening growth paths in Section 2.3, and the 
extension of incorporating covariates in Section 2.4. In Section 3 we provide 
examples of applying the proposed method in the field of pediatrics. In Sec¬ 
tion 4 we present the numerical investigation of our method. In Section 5 
we include discussions and conclusions on the important findings. 

2. Methods. 

2.1. Settings and notation. A reference growth data set consists of N 
subjects and their longitudinal measurements Here 
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rrii is the number of measurements for the zth subject, and Yij is the jth 
observation for the ith subject measured at the time of Tij, Tij G T. We 
assume that each longitudinal growth path is observed from the following 
model: 

(1) Yij = Yi{Tij) + Ejj, Tij G T , 

where li(f)’s are the underlying growth paths, £ij's, and independent of 
hi(t)’s, are i.i.d. random errors with mean zero and constant variance cr^. 
£ij can be viewed as the measurement error associated with Y^j, and we 
implicitly assume that measurement errors do not depend on magnitudes of 
measurements and measurement times. Such assumptions are reasonable for 
reference growth data. For example, the weight measurement error due to 
a weight scale is usually related neither to the weight itself nor to the time 
when the weight is taken. 

By the Karhunen-Loeve theorem in Loeve (1978), the true growth paths 
Yi{t), if smooth and continuous, can be written as 

OO 

(2) Yi{t) = U{t) + y^^rik(j)k{t), 

k=l 

where U{t) = E{Yi{t)} is the population mean function, 4’k{t)'s are principal 
component functions, which are continuous pair-wise orthogonal functions 
on T with dt = 1, and r^^’s are principal component scores, which 

are uncorrelated random variables with mean 0 and variance Afc, where Ai > 
A 2 ,_This decomposition provides the basis of PC A for functional data. 

We further assume Yi{t) can be well approximated by the first K prin¬ 
cipal component functions, that is, Yi{t) « U{t) + This ap¬ 

proximation is biologically plausible, since the biological growth process is 
mainly driven by several growth hormones, as mentioned in Zhang (2012). As 
each growth hormone determines a particular growth pattern, the observed 
growth path is the result of their joint actions. Therefore, with the A:th com¬ 
ponent function representing a certain growth pattern, the component 
score Tik measures the extent to which (pk{t) contributes to the individual 
growth path Yi{t). The biological meaning of component functions (pkii) and 
scores rik is also exemplified in Section 3.1. This way, the distribution of the 
growth paths, li(t)’s, are fully determined by their component scores. Con¬ 
sequently, the growth charts for Yi{t) can be constructed based on the joint 
distribution of the first K component scores. To estimate the component 
functions of Yi{t) from the reference growth data, we proposed a regression 
based PC A algorithm in Section 2.2. 

The following notation will be used to illustrate our proposed method: 
L^(7~) is the set of square integrable functions defined on the time in¬ 
terval T. Denote || • |p as the norm for functions in L^(T), that is, 
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||/P= ^LP‘{T). The inner product of two functions/i(t) 

and / 2 (t) in L?'{T) is defined as (/i, / 2 ) = dt. When (/i, / 2 ) = 0, 

we say that fi{t) and / 2 (t) are orthogonal to each other, denoted as /i T / 2 . 

2.2. Regression based principal component analysis for growth data. Ref¬ 
erence growth data can be considered as sparse functional data due to the 
sparse and irregular data structure. There exists a few PCA methods for 
sparse functional data, including Yao, Muller and Wang (2005), James, 
Hastie and Sugar (2000) and Peng and Paul (2009). Yao, Muller and Wang 
(2005) involved the estimation of high-dimensional covariance matrices, as 
well as their inverses, which may not be computationally stable. James, 
Hastie and Sugar (2000) provided a stable maximum likelihood estimation 
(MLE) algorithm under the assumption of Gaussian process. Peng and Paul 
(2009) implemented the same model from James, Hastie and Sugar (2000) 
using an improved fitting procedure. However, the distribution assumption 
of the MLE methods may not be satisfied by reference growth data. In this 
section, we propose a regression based PCA algorithm which is computa¬ 
tionally stable, not relying on strong distribution assumptions, and easily 
incorporates covariates. Without loss of generality, and to simplify the no¬ 
tation, we assume in this section that the population mean U{t) in (2) is 0. 
Eor nonzero U{t), we can get its nonparametric estimation and subtract it 
from Yi(t). The algorithm can be applied to the remaining part as discussed 
in Remark 1. 

The proposed algorithm is based on the fact in Graves, Hooker and Ram¬ 
say (2009) that, given (pi{t), 1 <l < k, and rj^’s, the /cth component function 
(pkit) is the minimizer of the objection function 

(3) E\\Y^{t)-r^kMt)f, 

subject to the constraints that ||</>a:|P = 1 and Y < I < k. And given 
0fc(t), the component score is 

(4) nk = {Yi^cpk) = argmin||Yi(f) - r(t)k{t)\\^- 

r 

These optimizations provide a theoretical basis for estimating (pkit) and 
iteratively and sequentially. 

Naturally, a sample version of the objective function (3) can be con¬ 
structed by 

^ N mi 

■^Y/V X!/ X!/1 “ '<"ik4>k (Tij ) I . 

Li=l m-i i=l j=l 

Moreover, to estimate 0fc(t), we approximate it through B-spline approxi¬ 
mations, that is, there exists a ctk G such that (pkit) ~ '^(^)^Q!fc) where 
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7r(t) = {7ri(i),... are B-spline basis functions given the specific 

knots and order, de Boor (1978) showed that any smooth function can al¬ 
ways be well approximated by a B-spline representation with a sufficient 
number of knots. The selection of knots and order in practice is discussed in 
Remark 5. With the above approximations, we have the following working 
objective fnnction: 

^ N nii 

(cTfc, ^ ^ ^ ^ I k)j rif^7v(Tij) ock\ j 

Ei=i m 

(5) 

s.t. \\7r{t)'^ak\\‘^ = 1 and 7v{t)'^cxk T 7r{t)'^ cxi,\/l <l < k, 

where Rk = {rik, ■ ■ ■ ,rNk)'^ is the vector of the /cth component scores. 

In what follows, we present a sequential and iterative algorithm to esti¬ 
mate cxk and Rk in (5). Our proposed alogrithm is inspired by the iterative 
least square method in Wold (1966), which was used to conduct multivariate 
PCA. A similar algorithm in alignment with robust regressions was studied 
in Chen, He and Wei (2008). However, our algorithm is the first attempt to 
implement such an iterative algorithm in PCA for sparse functional data. 


Estimating the 1st component. The algorithm starts with estimating the 

1st component (q:i,Ri). We use and R^^'^ for the estimates of cxi and 
Ri at the z^th iteration. The algorithm includes the following steps: 


Step 1: Initial values. Generate Ri with each of its elements following 

uniform (0,1) distribution and denote it as 

Step 2: Alternating regressions. Continue from the i^th iteration step with 


R^^\ We obtain by 


/r‘\ C^Al) 

(6) Qj = arg mm 


N mi 


«eK% ^ 




and then standardize cx 


(^+1) 


1 


by 




. The resulting satis- 


fies ||7r(t)^Q:^^'*'^^ Ip = 1. Next we update the component scores by 


(7) 


(u+l) 


- r7r{Tijf<x^l 


r^(i'+i)|2 


i = l,2,, 


.N. 


i=i 


Here (7) involves N separate regressions. Continue iterations until the fol¬ 
lowing two conditions are satisfied: 

1. The differences of R^^^ and and are less than some 

small value di for all their elements; 
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2. The change in the objective function 0^2 (cti, i?i) between two consec¬ 
utive iterations does not exceed a small value 62 - 

Step 3: Solutions. We denote the resulting estimates from step 2 as the 
Si and Ri, which are the estimates for cii and Ri- 

It is easy to see that the objective function Di 2 {cxi,Ri) is monotonically 
nonincreasing at each iterative step, and the algorithm will converge to a 
local minimizer. 


Estimating the kth component with k > 1. When we move to the kth 
component {ak,Rk) with A: > 1, we need to solve the constrained objective 
function (5). A numerical algorithm directly incorporating such constraints 

is not straightforward. However, if subtracting Yli=i from Yij, 

(k—1) 

and denoting the resulting residuals as Qj , we then have the following 
alternative but equivalent objective function: 


( 8 ) 


N rrii 


^i=l i=\ j= 




subject to the only constraint ||7r(t)^Q:fc|| = 1. The equivalence between (8) 
and (5) comes from the fact that the component function is also the 

minimizer of E\\Y^^ , where Y^^ is Yi(t) — 

The new objective function (8) of {ak,Rk) is the same 
in format as the one for (Q;i,i?i). Therefore, estimating {cxk,Rk) can be 
achieved in a similar fashion as (Q:i,i?i). The only difference is at each 
iteration step, we need to orthogonalize 7r'^{t)ak against the previously es¬ 
timated {t)ai,'il < k to further improve the computational stability. The 
numerical details of orthogonalization are provided in Remark 4. When the 
observations of the growth paths are sufficiently dense, the orthogonality 
holds automatically without the orthogonalization step. The convergence 
and nonincreasing property also hold for each k. The R program for the 
proposed algorithm is provided in the supplemental documents Zhang and 
Wei (2015). 

At last, to determine the number of necessary components K, we propose 
a model adequacy measure that is an analog of R^ from Croux et al. (2003). 
It measures the total variability explained by the first K components, that 
is. 


(9) 


R^{K) = 1 - 


Eh ETMYy - Ek=i r^kT^m.r^ikr 

E N ^nii y'2 
2=1 2-jj=l ^ ij 


We stop the estimation algorithm when R?‘{K) is sufficiently large. The PCA 
approximation of Yi{t) can be returned as Yi{t) = Ek=i^ik'^{h^k- 
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Remark 1. The above estimation algorithm assumes that U{t) = 0, 
hence, one needs to properly center the growth paths hi(t)’s before using the 
algorithm. We propose to estimate the mean function U{t) = E{Y(t)} using 
nonparametric methods, such as B-spline smoothing and local polynomial 
smoothing, which provide uniform consistent estimators of the population 
mean as shown in Hansen (2008), de Boor (1978) and Fan and Gijbels (1996). 

Therefore, the algorithm can be applied to centered data Y*j = Yij — U{Tij), 

where U{t) is the estimate of U{t). Here Y*- are asymptotically equivalent 
to the truly centered data as proved in Han and him (2010). 

Remark 2. In step 2 of our proposed algorithm, we standardize by 

, ^ in each iteration. The standardization step is to meet the con- 

yi|,rORaM||2 

straint that ||7r(t)^Q!fc|p = 1. It does not alter the value of objection function 
Di 2 {ak, Rk) since VikOc^ = rikcc~^cxj^ for any nonzero real number c. 

Remark 3. The proposed algorithm can also be used to obtain singular 
value decomposition of functional data. Let Y(t) = {Yi(t),..., Y^it)}'^, R = 
{Ri, R 2 , ■ ■ ■), and <h(t) = {(^i(t), ()) 2 (t), ■ ■ •}, then the decomposition (2) can 
be written as Y(t) = R<I>(t). If we further decompose R = UD, where D is 
a diagonal matrix, we yield the singular value decomposition for Y(t), that 
is, Y(t) = UD^{t). This step can be easily incorporated to the algorithm, 
but further decompositions of R are out of interest in our context. 

Remark 4. Let W = f 7r{t)7v{t)"^dt, where 7r(t) = {Tri{t), ... 
are the given B-spline basis functions. W is a In x matrix. Each element 
of W is the inner product of two basis functions, which can be calculated 
from numerical integrations. Since W is a positive-definite matrix, it can be 
decomposed as the cross-product of In this way, 7v{t)^ak T 7 v{t)^cxi is 

equivalent to (W^/^Q;fc)^(W^/^Q;;) =0. The orthogonalization of 
against can be achieved through Gram-Schmidt orthonor¬ 
malization from Trefethen and Bau (1997), which projects into 

the orthogonal space spanned by obtains the projection as 

and hence has as the orthogonalized otk- In each of the 

iterative steps, we implement such orthogonalization to update (x^k\ which 
makes the final solution of satisfy iv{t)'^ak T Tv(t)'^ai,l < k. 

Remark 5. In practice, we choose the knots of B-spline basis func¬ 
tions to be O' — 1 equally spaced quantiles of pooled time points, that is, 
|, I • • • ^^th quantiles. In this way, the B-spline basis functions are deter¬ 
mined by q and order. Since there are only two parameters, it is straightfor¬ 
ward to choose them by 5-fold cross-validation using AIG or BIG criterion. 
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Based on our numerical experience, the results are not sensitive to the exact 
locations of knots. 

Remark 6. The proposed algorithm has a lack of consistency of re¬ 
sults for the estimated principal component functions and scores under the 
sparsity setting in this paper. A weak asymptotic result for the principal 
component functions under restrictive assumptions exists. 

2.3. The construction of growth charts for growth paths. Through 
the proposed PCA algorithm, we can approximate Yi{t) as U{t) + 
fjfc7r(t)^Sfc. Hence, the percentile ranks of Yi{t) can be identified by 
estimating the multivariate quantiles of (fji, ■.. Multivariate quan¬ 

tiles consider the joint distribution of components scores and bring addi¬ 
tional insights in screening growth patterns. The individual percentile ranks 
determined by component scores enable the comparisons among subjects, 
which can be useful for pediatric practice. For example, subject A is at the 
95th percentile and subject B is at the 97th percentile. Using the percentile 
ranks, a pediatrician can prioritize the work by examining the health status 
of subject B first, since subject B is more likely to have health issues given 
its higher percentile rank. 

Due to the lack of natural ordering in a multidimensional space, there is no 
universally preferred definition of multivariate quantiles, but various ideas 
have been developed in the literature. For example, Liu, Parelius and Singh 
(1999) and Zuo and Serfling (2000) used multivariate quantile functions 
based on the half-space depth functions. Other approaches have been given 
by Parzen (1979), Abdous and Theodorescu (1992), Hettmansperger, Ny- 
blom and Oja (1992), Chaudhuri (1996), Koltchinskii (1997), Chakraborty 
(2003), McDermott and Lin (2007) and Wei (2008). Serfling (2002) presented 
a nice survey of multivariate quantile functions and outlined the probabilistic 
properties that a multivariate quantile function should have. 

In our case, the joint distribution of (fii,... ,riK) is unlikely to follow a 
certain parametric distribution due to the complexity of sparse functional 
data. Therefore, we propose to determine their multivariate quantiles non- 
parametrically using Wei (2008), since this method is also motivated from 
growth chart problems, and measuring the spatial “outlyingness” of an ob¬ 
servation relative to a center, which is the essential part of growth chart 
studies. Wei (2008) converts the component scores into the polar coordinate 
system and builds the quantile contours by nonparametrically regressing 
the radiuses with respect to the angles at various quantile levels. Then, by 
building a sequence of nested multivariate quantile contours of the K com¬ 
ponent scores, our growth chart can be constructed and used to determine 
the percentile ranks of growth paths. 
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Suppose we want to use our constructed growth chart to screen a growth 
path of a new subject, including m* observed measurements, 

We first obtain its component scores {r*i,... by the following least 

square regression: 


( 10 ) 


1 


m* 

mm - > 

J=1 


K 


Y^j -U{T^j) - 'y^rk^k{T*j) 


k=l 


where U{t) and are estimated from the reference growth data. By 

the estimated component scores, this subject can then be located on the 
constructed growth chart. If it stays outside an extreme quantile contour, 
such as the 0.95th quantile, we say that its growth path is more unusual 
than at least 95% of its peers, hence it can be singled out for further clinical 
investigations. 


2.4. Incorporating covariate effects. Since incorporating subject level in¬ 
formation, such as parental information and ethnicity, can enhance screen¬ 
ing performance, we extend our proposed method to include a covariate X. 
Suppose the reference growth data consist of {{Yij,Tij,Xi),i = 1,..., N,j = 
1,..., TTij}, where Xi is the covariate of the ith subject. We assume that the 
measurement Yij is observed from 

Yij = Yi {Tij , XiEij, 

where Yi{t,x) is the underlying growth path for the ith subject, and de¬ 
pends on both age t and covariate x. By extending the Karhunen-Loeve 
decomposition, we can write 

OO 

(11) Yi{t,x) = U{t,x) + '^rik(t)k{t,x), t£T, 

k=l 

where U{t,x) is the mean function, (pkit,x)’s are pair-wise orthogonal com¬ 
ponent functions, and r^ffs are individual component scores with respect 
to (j)k{t,x). Following similar ideas in Section 2.2, we extend the working 
objective function (5) as follows: 

^ N mi 

(12) DffrikiCXk) = -^ - '^'Y]\Yij - rikTT{Tij)'^c^kIi{Xi)\^ s.t., 

Ei=i mi 

(13) j {7r(t)'^Q:fe/x(x)}^ dt = 1; 

(14) J {7v{t)'^cxk7r{x)}{n'^{t)aifx{x)}dt = 0 \/l<l<k. 
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Here OLkiJL{x) provides the approximation of cj)k{t,x), where 7r(t) is the 
B-spline basis functions for t as in Section 2.2, ^i{x) = {fj,i{x),... 
is a set of covariate functions, and cx^ becomes a ijsf x ix matrix instead of a 
vector. The simplest choice of covariate functions fJ.{x) is (l,x)^, which im¬ 
plicitly assumes the component functions are linear in x for any given t. If the 
linearity assumption does not hold, one could consider including quadratic 
terms of x or even choosing fi{x) as B-spline basis functions to avoid any 
parametric assumption. Since Tv{t)'^cXkfJ-{x) is still a linear function of a^, we 
can implement the similar iterative algorithm in Section 2.2 by alternatively 
updating cxk and The major differences in each iteration come from the 
standardization and orthogonalization of 7r(t)^Q;fc/i(x) in order to meet con¬ 
straints (13) and (14), details of which are provided in Zhang (2012). Simi¬ 
larly, the covariate adjusted algorithm is conducted sequentially, and stopped 
when reaching an appropriate number of components K, which is determined 


by the extended R^, that is, 1 — 


E N 
i=l 


. Then 


the underlying growth path Yi(t,Xi) can be well approximated by the first 
several component functions, and hence determined by its component scores. 
Therefore, the growth chart for screening growth path can be constructed 
and implemented in a similar fashion as the one described in Section 2.3. 


3. Application examples. 

3.1. Growth charts for screening pubertal growth paths. In this section 
we illustrate our proposed screening method using part of a Finnish national 
growth data set from Pere (2000). The data consist of longitudinal height 
measures of 553 girls (ages 9-16) and 518 boys (ages 11-19) during puberty, 
as shown in Figure 2. The median number of measurements for each subject 


Girls (9-16) 



Boys (11-19) 



Fig. 2. Part of a Finnish national growth data from Pere (2000). The data include the 
longitudinal height measurements for 553 girls (left) from ages 9 to 16 and 518 hoys (right) 
from ages 11 to 19. The y-axis is height and the x-axis is age. The dots are the observed 
height measurements. 
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Age(Time) Age(Time) 

(a) (b) 

Fig. 3. The estimated first two component functions (j>i{t) (a) and (j> 2 {t) (b) for girls. 


is 6. The analysis is stratified by gender. We apply the proposed regression 
based PCA using quadratic B-splines with internal knots 11 and 13.56. The 
resulting first two component functions are plotted in Figure 3 for girls and 
Figure 4 for boys. In both cases, they count for 90% variability of the growth 
paths based on the proposed measure (9). 

In both genders, we find that the first component function reflects 
the overall growth scale, while the second one 4 > 2 {t) coincides well with the 
puberty growth velocity pattern. The second component function increases 
rapidly starting around age 11 and stabilizes after age 15 for girls [Fig- 




Fig. 4. The estimated first two component functions 4>i{t) (a) and (j) 2 {t) (b) for boys. 
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Fig. 5. The bivariate plot of the first two component scores for girls (a) and boys (b). The 
X axis represents the first component score and the y axis represents the second component 
score. The contours from inside to outside are the bivariate quantile contours at quantile 
levels 0.5, 0.75 and 0.95. The points labeled “A” and “B” in (a) are two selected girls 
whose first two component scores fall outside the 0.95th quantile contour, (a) The growth 
chart for girls, (b) The growth chart for boys. 


ure 3(b)], while a similar patten is found between age 14 and age 18 for 
boys [Figure 3(b)]. This difference in 4 >2{'t) is biologically reasonable since 
the puberty of boys begins later than girls. Therefore, the corresponding 
principal component scores have a nice biological interpretation. A subject 
with a higher rn tends to be taller than most of his or her peers, while a 
subject with a higher rj 2 may experience rapid pubertal growth. The growth 
charts are constructed based on the first two component scores, as shown 
in Figure 5(a) for girls and Figure 5(b) for boys. Such charts provide a 
convenient visual tool for screening potentially unusual growth patterns. In 
both figures, the x axis represents the first component score and the y axis 
represents the second ones. Bivariate quantile contours at quantile levels 
0.5, 0.75 and 0.95 are added to determine the individual percentile ranks. 
The individuals staying outside the 0.95th quantile contour have more out¬ 
lying component scores than at least 95% of their peers. Hence, they will be 
screened out for further clinical investigations. 

To illustrate the screening performance of our constructed growth charts, 
we select two girls, A and B, who are outside the 0.95th quantile contour 
in Figure 5(a), and further examine their growth paths as shown in Fig¬ 
ure 6. In Figure 6, the black dots are the original height measurements, and 
the dashed lines are the estimated underlying growth path Yi{t). The gray 
curves in the background are all the growth paths from the data. According 
to Figure 5(a), girl A has small component scores in both directions, while 
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Subject A Subject B 



Age Age 

R=(-61.9-7.4) R=(-0.1-13.1) 


(a) (b) 

Fig. 6. The observed growth paths of two extreme girls, girl A (a) and girl B (b) in 
Figure 5(a). The black dots are the original height measurements, and the dashed lines are 
the estimated growth paths. The gray background curves are all the growth paths from the 
Finnish growth data for girls. 


girl B has an average first component score, but a very low second compo¬ 
nent score. Consequently, as shown in Figure 5, girl A is shorter and slower 
than most of her peers; girl B has normative height, but apparently fails to 
gain enough height during her puberty. In both cases, the unusual growth 
patterns detected by our proposed growth charts are confirmed by empirical 
observations of the growth paths. 

Comparison to existing growth charts. As we illustrate in the Introduction, 
screening entire growth paths may bring new insights in monitoring human 
growth. The outlying girl C in Figure 5(a) is one example. Figure 7 provides 
the observed growth path of girl C. Her height starts around the median at 
the age of 9 and gradually increases to the upper percentile by the age of 
16. 

We first screen each of her measurements (black dots) using conventional 
growth charts and conditional growth charts. Specifically, following conven¬ 
tional growth charts from Wei et al. (2006), we estimate the 0.025th and 
0.975th percentiles that are conditioned only on her ages (squares in Fig¬ 
ure 7). And following conditional growth charts from Wei et al. (2006), we 
estimate the same reference percentiles conditioned on both her ages and 
prior measurements (triangles in Figure 7). 

As shown in Figure 7, all of her height measurements are within the 
normal ranges of both conventional and conditional growth charts. There¬ 
fore, when these two growth charts are used to screen her height one at a 
time, each of her height measurements is considered as normative. However, 
when we screen her entire growth path using the proposed method, girl C 
is screened out by the 0.95th quantile contour since her second component 
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Subject C 



Age 

R=( 10.3,9.7) 


Fig. 7. The observed growth path of one extreme girl (girl C) in Figure 5(a). The black 
dots are the original height measurements and the dashed line is the estimated growth 
path. The gray background curves are all the growth paths from the Finnish growth data 
for girls. The squares are the estimated 0.975th (open squares) and 0.025th (solid squares) 
quantiles from the unconditional growth chart. The triangles are the estimated 0.975th 
(open squares) and 0.025th (solid squares) quantiles from the conditional growth chart. 


score appears unusually large. It is consistent with the fact that she has been 
growing fast consecutively over her entire puberty. This example shows that 
the proposed method provides informative insights on growth pattern by 
considering entire paths. 

3.2. Growth charts conditioned on mother’s height. Parental heights usu¬ 
ally have strong associations with their children’s growth. In this section 
we incorporate mother’s height into the model and examine the pubertal 
growth of the Finnish teenage girls. The data set used here is a subset of 
girls’ data in Section 3.1, including 444 girls with mother’s height informa¬ 
tion available and at least 5 measurements between ages 9 and 16. To make 
the comparisons, we apply our proposed method, both with covariate and 
without covariate, to the data. We choose /x(x) to be Ijl{x) = (l,a:)^. Under 
this parameterization, U{t,x) = Ui{t) + xU 2 {t), (j)i{t,x) = 4>ii{t) + x4>i2{t), 
and (p 2 {t,x) = (p 2 i{t) + x4>22(t). The unknown functions Ui{t), U 2 (t), 4>ii{t), 
012 (t), 021 (^) and 022 (i) are all approximated using quadric B-splines with 
internal knots equal to 1/3 and 2/3 quantiles of pooled times. 

We use a bootstrap to test whether the covariate associated functions 
U 2 {t), 012 (i) and 022 (i) are equal to zero at any t, which is essentially 
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(a) 


(b) 


(c) 


Fig. 8. (a) The estimated mean functions from the covariate adjusted model (dashed 

lines) and the model without covariate (solid line). The dashed lines are the estimated 
mean functions conditioned on six different mother’s heights. The lines from the lightest 
gray to the darkest gray represent 150 cm, 155 cm, 160 cm, 165 cm, 170 cm and 175 
cm, respectively, (b), (c) The estimated first two component functions from the covariate 
adjusted model (dashed lines) and the model without covariate (solid lines), (a) Estimated 
location functions, (b) 4>i{t) for girls, (c) 02 (i) for girls. 


testing whether the corresponding B-spline coefficients are equal to 0. More 
details can be found in Zhang (2012). The resulting p-values indicate that the 
mother’s height is significantly related to U{t,x) (p-value < 0.0001), while 
and 4>2{'t,x) are insignificant (p-values equal to 0.72 and 0.59). We 
hence simplify the covariate adjusted model to 

Yi{t,Xi) ^ Ui{t) + xU2{t) + riicj)u{t) + ri2(j)2i{t). 

In Figure 8(a), the solid line is the estimated mean function without con¬ 
sidering mother’s height, and the dash lines are the expected growth paths 
conditioned on six different mother’s heights which are 150 cm, 155 cm, 160 
cm, 165 cm, 170 cm and 175 cm (from darkest gray to the lightest grey), 
respectively. Covariate adjusted mean functions show that with the increase 
of mother’s height, the expected body sizes and growth rates both tend to 
increase as well. We also observe the expected growth path conditioned on 
160 cm is close to the expected growth path of the whole population. The ex¬ 
planation is that the average of mother’s height in this data set is 161.6 cm, 
which is close to 160 cm. As shown in Figures 8(b)~(c), the estimated com¬ 
ponent functions from both models are very close to each other. However, 
due to the difference in the mean functions, the distributions of individual 
component scores are fairly different between the two models. Figure 9 plots 
the bivariate quantile contours estimated from two sets of component scores. 
We say that Figure 9(a) is the covariate adjusted growth chart for puberty 
growth paths and Figure 9(b) is the marginal one. 

Two girls, D and E, are selected from the sample and placed against the 
two growth charts. The growth path of girl D is considered as unusual in the 
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Model 1 

The growth chart for girls with mother's height adjusted 



(a) 


Model 2 

The growth chart for girls with no covariate adjusted 



(b) 


Fig. 9. The bivariate plots of the first two component scores for the covariate adjusted 
model (a) and the model without covariate (b). The x axis represents the first component 
score and the y axis represents the second component score. The contours from inside to 
outside are the bivariate quantile contours at quantile levels 0.5, 0.75 and 0.95. 


marginal growth chart, but not in the covariate adjusted one. In contrast, 
the growth path of girl E is only considered as unusual in the covariate ad¬ 
justed growth chart, but not in the marginal one. Figures 10 and 11 provide 
their growth paths (black solid lines and dots) for further investigations. 


Subject D 


Mother's height = 155cm 



Subject D 


Mother's height = 155cm 



(b) 


Fig. 10. The observed growth path of girl D in bivariate plots Figure 9. The black dots 
are the original height measurements. The gray background curves in (a) are all the growth 
paths from this data set. The gray background curves in (a) are the growth paths of the 
individuals with mother’s height from 153 cm to 155 cm. 
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Subject E Subject E 

Mother's height = 156cm Mother's height = 156cm 



Age Age 

R=(-3.8,8.3) R=(2.9,9.7) 

(a) (b) 

Fig. 11. The observed growth path of girl E in bivariate plots Figure 9. The black dots 
are the original height measurements. The gray background curves in (a) are all the growth 
paths from this data set. The gray background curves in (a) are the growth paths of the 
individuals with mother’s height from 154 cm to 158 cm. 


In Figure 10(a), we compare the target paths to all the growth paths in 
the sample (gray curves), while in Figures 10(b), we compare them only to 
those (gray curves) who have similar mother’s heights (±2cm). We find that 
girl D has grown unusually slow from ages 12 to 16 compared to others in 
the entire sample. That explains why girl D has an unusually low second 
component score in the marginal growth chart. However, if one restricts to 
those whose mothers have heights around 155 cm, her slow puberty growth 
is less extreme, as we observe more similar slow growth patterns in this sub¬ 
set. Subject E has normative body sizes and growth rates according to the 
marginal growth chart, but has excessive growth based on the covariate ad¬ 
justed chart. Examining her growth path in Eigure 11, we find that she has 
consecutive years of fast growth from ages 12 to 15. This fast growth appears 
to be more extreme when being compared to those whose mothers have sim¬ 
ilar heights. In this case, we would have missed the excessive growth of girl 
E if we did not take her mother’s height into consideration. These exam¬ 
ples show that incorporating subject level information, especially parental 
information, might lead to improvements in screening growth paths. 

4. Numerical investigations. 

4.1. Finite sample performance. In this section we present a numerical 
simulation study to illustrate the hnite sample performance of the proposed 
PCA method in comparison to the alternative Yao, Miiller and Wang (2005) 
and MLE methods. Eor MLE methods, we use the fpca R package based 
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on Peng and Paul (2009) since it provided an improved fitting of James, 
Hastie and Sugar (2000). We consider the following model to generate the 
simulation data: 

Yij = YiiTij) = U [Tij) + rii(f>i{Tij) + ri24>2{Tij) + eij, 

where (/>i(t), </> 2 (t) and U{t) are chosen to be the estimated functions for girls 
in Section 3.1. We consider the following two distributions for (rji,ri 2 ). In 
setting 1, we generate them from the empirical distribution of the estimated 
hrst two component scores for girls in Section 3.1. In setting 2, we generate 
them from a bivariate normal distribution with sample means and covariance 
estimated from the first two component scores for girls in Section 3.1. Both 
settings try to mimic growth paths of the Finnish data for girls, while a 
more restrictive parametric assumption is made in setting 2. For each of 
the above two settings, we generate 20 Monte Carlo samples. Each sample 
includes N = 500 random curves. Each one consists of m* = 6 observations 
with the observed time Tij uniformly distributed on [9,16]. 

Eor each sample, we use the proposed method, Yao, Muller and Wang 
(2005), and the MLE method to conduct PCA. We first estimate U{t) using 
nonparametric regression and then apply the three methods to the centered 
data Y*j = Yij — U{Tij) to estimate component functions. The selection of 
tuning parameters for all three algorithms is described as the following. Be¬ 
cause both our method and the MLE method from Peng and Paul (2011) 
use B-spline functions to represent component functions, we choose the same 
set of basis functions for both methods, that is, the quadratic B-spline basis 
functions with the 1 /3th and 2/3th quantiles of the pooled times as the in¬ 
ternal knots. Yao, Muller and Wang (2005) relied on estimating the variance 
and covariance by two-dimensional local polynomial smoothing. Its smooth¬ 
ing parameters are determined bj minimizing the AIC type criterion, that is, 
N X log{;^ ~ where p is the number of param¬ 

eters and Yij is the predicted Yij. All codes for the simulations are written 
in R language and run under R version 3.0.0 on a machine with Intel(R) 
Xeon(R), CPU 3.20 GHz and 16 GB RAM. On average, the running time 
to conduct PCA for one Monte Carlo sample is 17 seconds for our proposed 
method, 18 seconds for the MLE method, and 30 seconds for Yao, Muller 
and Wang (2005). 

To evaluate the estimation performance of the three methods, we calculate 
relative integrate squares errors (RISE) for both and /> 2 (t), where RISE 

for estimating a target function g{t) is defined as 

estimate. RISE can be considered as noise to signal measurements. The 
integrations in RISE are evaluated using the left Riemann sum [Thomas, 
Einney and Weir (1988)] with the equal partition of the whole interval into 
100 small intervals. Table 1 provides the summary of RISEs under both 
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Table 1 

The summary of RISEs for the three sparse functional PC A methods 


Means (standard deviations) of RISE 


Yao et al. (2005) The MLE method The proposed method 


Setting 1: (rii,ri 2 ) ~ Empirical distribution 

RISE of (/)i(t) 0.0061 (0.0017) 0.0003 (0.0003) 

RISE of (ji 2 (t) 0.0955 (0.0545) 0.0022 (0.0009) 

Setting 2: (rii,ri 2 ) ~ Bivariate normal distribution 
RISE of (/)i(t) 0.0052 (0.0018) 0.0003 (0.0003) 

RISE of (^ 2 (t) 0.1076 (0.0872) 0.0023 (0.0012) 


0.0004 (0.0005) 
0.0020 (0.0015) 


0.0004 (0.0003) 
0.0027 (0.0014) 


settings. As shown in Table 1, all three methods perform well in estimating 
component functions, although Yao, Muller and Wang (2005) have slightly 
larger means and standard deviations. 

We further evaluate the estimation errors of component scores among 
the three methods. For each Monte Carlo sample, we calculate relative mean 

square error (RMSE), defined as > where is the estimator 

of Tjfc and s^{riy) is the sample variance of rj^. RMSE measures the fraction 
of variance unexplained caused by estimation errors. Yao, Muller and Wang 
(2005) involve the estimation of the individual covariance matrix and its 
inverse, which can be singular or close to singular. When it happens, it can 
deviate the estimation of component scores To make a fair comparison, 
we exclude the top 5% extreme square errors in the calculation of RMSE for 
Yao, Muller and Wang (2005). RMSEs under both settings are summarized 
in Table 2. All three methods work well for the 1st component with average 
RMSEs less than 5%. Eor the 2nd component scores, the average RMSEs 
for both our proposed method and the MLE method increase but still less 
than 20%, while the RMSEs for Yao, Muller and Wang (2005) tend to be 
slightly larger. 

4.2. Screening power. To illustrate how sensitive the proposed method 
is in identifying outlying growth paths compared to the conventional and 
conditional growth charts, we simulate Monte Carlo samples from setting 
1 as the reference growth data and build the three types of growth charts 
accordingly. We then simulate outlying growth paths Zi{t) from Zi{t) = 
Yi{t) + A[t — 9) + B. Here Yi{t) follow the correct model from setting 1, 
and A[t — 9) + B is a linear contaminated term, where A provides the 
slope deviation and B represents the location shift. We choose A from 
(-4, -2, -1,0,1,2,4) and B from (-20, -12, -4,0,4,12,20). Eor each {A, B) 
combination, we generate 100 curves with 6 observations Zij = Zi[Tij) + £ij 
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Table 2 n ~ 2 

The summary of relative mean square errors (RMSE) for the three 

sparse functional PCA methods 



Means (standard deviations) of RMSE 

Yao et al. (2005)* 

The MLE method 

The proposed method 

Setting 1: {rii,ri 2 ) 

~ Empirical distribution 


RMSE of m 

0.05 (0.04) 

0.01 (0.01) 

0.02 (0.01) 

RMSE of ri 2 

0.69 (0.47) 

0.13 (0.02) 

0.17 (0.03) 

Setting 2: {rii,ri 2 ) 

~ Bivariate normal distribution 


RMSE of ni 

0.07 (0.05) 

0.01 (0.01) 

0.02 (0.01) 

RMSE of ri 2 

0.87 (0.63) 

0.14 (0.03) 

0.17 (0.04) 


*Note: Yao et al. (2005) involve the estimation of the individual covariance matrix and its 
inverse, which can be singular or close to singular. When it happens, it can deviate the 
estimation of component scores Vik- To make a fair comparison, we exclude the top 5% 
extreme square errors in the calculation of RMSE for Yao et al. (2005). 


each. Figure 12 shows the selected outlying curves (dashed lines) under sev¬ 
eral combinations of A and B. The background gray curves are from one 
simulated sample. The simulated curves become more outlying with the in¬ 
crease of either 1^41 or \B\. Following the procedure in Section 2.3, we locate 
the simulated outlying curves in the growth charts and screen out those 

5. Conclusion and discussion. This paper develops a new statistical 
method to construct growth charts for screening entire growth paths. By 




Fig. 12. (a), (b) The selected outlying curves under different combinations of A and B. 

The background gray curves are simulated curves from one Monte Carlo sample. 
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considering entire growth paths, the proposed growth charts bring more in¬ 
formative insights into monitoring pediatric growth. When our constructed 
growth chart is applied to the Finnish growth data for monitoring puberty 
growth, it shows more effective performance in detecting possible unusual 
growth patterns compared to existing growth charts. Besides pediatrics, our 
proposed method can also be applied to other areas, such as monitoring CD4 
lymphocyte counts of uninfected children born to HIV-l-infected women in 
HIV research, and helping determine the gene frequencies of the most com¬ 
mon mutations in the HFE gene in genetics research. 

outside the 95th percentile contours. We also screen each of the mea¬ 
surements from the simulated outlying curves using the conventional and 
conditional growth charts. Specifically, following the conventional growth 
chart from Wei et al. (2006), we estimate the 2.5th and 97.5th percentiles 
that are conditioned only on ages. And following the conditional growth 
chart from Wei et al. (2006), we estimate the same reference percentiles 
conditioned on both the ages and prior measurements. Using the conven¬ 
tional and conditional growth charts, we screen out the curves with more 
than one measurement outside the range between the corresponding 2.5th 
and 97.5th percentiles. Table 3 and Table 4 summarize the percentages of 
curves that are screened out by the growth charts, including both means 
and standard deviations over 20 Monte Carlo samples. The results illustrate 
that all three growth charts are effective in identifying outlying growth paths 
when both the location shift and slope deviation are very extreme {B = —20 
and A = —4). The conventional growth chart is most sensitive in screening 
out big location shifts (A = 0 and B = -20,-12,12,20). The conditional 
growth chart works the best for detecting dramatic slope deviations {B = 0 
and A = —4,—2,4,2). The proposed growth chart works the best for iden¬ 
tifying the unusual growth pattern combining moderate location shift and 
slope deviation {B = —4 and A = —2). Among the three growth charts, the 
proposed method has the most reasonable type I errors (the results when A 
and B are both 0) with mean 5.8% (9.8% for the conventional growth chart 
and 12.8% for the conditional growth chart). 

The proposed method also contributes to the statistical methodologies. 
First, it provides a new way to rank longitudinal/sparse functional data. 
It approximates the sparse and irregularly spaced functional data through 
PCA and represents each individual using the resulting components scores. 
Then the percentile rank of each individual can be identified by applying 
multivariate methods to components scores. Second, the proposed regres¬ 
sion based PCA algorithm provides a new way to conduct PCA for sparse 
functional data. As shown in Section 4.1, this algorithm is more computa¬ 
tionally stable than Yao, Muller and Wang (2005) by avoiding inverting the 
high-dimensional variance-covariance matrix. In terms of estimating compo¬ 
nent functions, the proposed method is comparable with the MLE method 


Table 3 

The means of the percentages of outlying curves Zi {t) that are screened out by the 95th percentile contours from the proposed growth 
chart, the 2.5th and 97.5 percentiles from the conventional growth chart, and the 2.5th and 97.5 percentiles from the conditional growth 

chart for different combinations of A (slope deviation) and B (location shift) 

Means of percentages: The proposed method/The conventional growth chart/The conditional growth chart 



B = -20 

B = -12 

B = -4 

B = 0 

B = 4 

B = 12 

B = 20 

A = -4 

100/100/100 

98.8/100/100 

95.9/98/99.3 

93.9/94.3/98.4 

91.3/89.2/96.9 

86.9/77.1/92.5 

87.6/83/84.9 

A = -2 

99.4/100/97.2 

94.9/96.8/88.9 

76.4/73.2/74.5 

63.9/51.7/65.6 

51.9/33.6/57.2 

45/46.4/42.9 

67.2/85.2/34.9 

A=-l 

98/99.2/82.7 

77.8/84.2/65 

40.2/39.2/45.3 

24.2/20.6/36.2 

18.9/15.8/28.6 

33.7/50/22.1 

73.8/90.3/24.4 

A = 0 

86.9/95.5/64.6 

46.5/62.8/44.3 

11.7/15.6/28.6 

5.9/9.4/12.8 

10.8/19.2/22.1 

47/65/24.6 

86.9/95.1/31.4 

A=1 

69.5/89/64 

28.1/48.2/52.1 

12.2/14.2/45.5 

16.7/22.8/44.5 

31.4/43.2/44.7 

74.4/86.1/48.4 

96/98.9/53.4 

A = 2 

60.6/83.7/78.8 

37/42.5/76.5 

39.5/34.3/75.8 

52.1/55/76.4 

ei.2/75.2/77.9 

91.6/97.2/80.7 

99/99.7/82.8 

A = 4: 

80.3/82.5/96.1 

81.3/76.8/96.4 

88/90.4/97.7 

91.9/95.3/98 

95.2/98.3/98.1 

99.1/99.8/98.6 

100/100/98.7 


Table 4 

The standard deviations of the percentages of outlying curves Zi{t) that are screened out by the 95th percentile contours from the 
proposed growth chart, the 2.5th and 97.5 percentiles from the conventional growth chart, and the 2.5th and 97.5 percentiles from the 
conditional growth chart for different combinations of A (slope deviation) and B (location shift) 

Standard deviations of percentages: The proposed method/The conventional growth chart/The conditional growth chart 



B = -20 

B = -12 

B = -4 

B = 0 

B = 4 

B = 12 

B = 20 

A = -4 

0.2/0/0 

1.1/0.2/0.2 

2.2/1.4/0.8 

3.1/2.6/1.4 

3.4/3.5/1.7 

3.9/5.2/2.6 

3.2/4.4/5.7 

A=-2 

0.8/0/2.4 

2.5/1.8/5.1 

6.1/7.6/5.8 

7.6/7.3/7 

7.5/5.9/7.3 

7.6/5.9/7.1 

5.9/4/8.3 

A = -l 

1.4/0.8/9.1 

6.4/4.5/8.9 

7.4/6.7/7.2 

7.1/5.4/6.3 

7.2/3.6/4.5 

7.7/5.7/6.1 

6.4/2.9/8.6 

A = Q 

4.5/1.8/8.6 

9.2/6/7.4 

4/4.1/4.7 

2.8/3.1/4.7 

3.7/4.5/4.3 

7.7/4.7/7.6 

5.7/2.1/11.9 

A = 1 

12/3/8.8 

9.8/6.8/8.1 

5.4/4/5.3 

5.5/4.4/4.6 

8.6/4.9/4.4 

9.8/4/9.7 

3.4/1.2/14.6 

A = 2 

14.9/3.4/6.9 

12.1/7/6 

10.3^.4/4.8 

11.8/5.3/4.5 

10.9/3.8/5.3 

4.3/1.4/7.8 

1.3/0.7/10.1 

A = 4 

6.5/3.8/1.7 

5.3/5.6/1.6 

4.2/2.6/1.9 

3.2/2/1.9 

2/1.2/1.9 

1.1/0.6/1.7 

0.2/0/1.7 


to 
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[Peng and Paul (2009)]. The difference between the proposed method and 
MLE methods is essentially the difference between least square regression 
and MLE estimator. However, the regression framework has its own ad¬ 
vantages over the likelihood approaches. Eor example, one can replace the 
mean regressions with robust regressions when the data are contaminated 
with outliers. In addition, with minor modifications, the proposed regres¬ 
sion based algorithm can also be used to conduct other types of functional 
decomposition such as singular value decomposition for functional data. By 
supporting various regression models and various decompositions, the pro¬ 
posed method can be extended to a rich family of lower dimension approxi¬ 
mations for sparse functional data. Incorporating covariates and conducting 
variable selections are also straightforward under the regression framework. 
Our PCA algorithm estimates the mean and component functions nonpara- 
metrically. If there are additional recourses indicating certain parametric 
forms are more suitable, the efficiency of our method can be further im¬ 
proved. 


SUPPLEMENTARY MATERIAL 

Supplement to “Regression based principal component analysis for sparse 
functional data with applications to screening growth paths” 

(DOL 10.1214/15-AOAS811SUPP; .zip). R programs for the proposed algo¬ 
rithm and an example of constructing the proposed growth chart. 
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